kl property
Convergence of Cubic Regularization for Nonconvex Optimization under KL Property
Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees to produce a second-order stationary solution for solving nonconvex optimization problems. However, existing understandings of convergence rate of CR are conditioned on special types of geometrical properties of the objective function. In this paper, we explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of the nonconvex objective functions. In specific, we characterize the asymptotic convergence rate of various types of optimality measures for CR including function value gap, variable distance gap, gradient norm and least eigenvalue of the Hessian matrix. Our results fully characterize the diverse convergence behaviors of these optimality measures in the full parameter regime of the KL property. Moreover, we show that the obtained asymptotic convergence rates of CR are order-wise faster than those of first-order gradient descent algorithms under the KL property.
Convergence of Cubic Regularization for Nonconvex Optimization under KL Property
Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees to produce a second-order stationary solution for solving nonconvex optimization problems. However, existing understandings of convergence rate of CR are conditioned on special types of geometrical properties of the objective function. In this paper, we explore the asymptotic convergence rate of CR by exploiting the ubiquitous Kurdyka-Lojasiewicz (KL) property of the nonconvex objective functions. In specific, we characterize the asymptotic convergence rate of various types of optimality measures for CR including function value gap, variable distance gap, gradient norm and least eigenvalue of the Hessian matrix. Our results fully characterize the diverse convergence behaviors of these optimality measures in the full parameter regime of the KL property. Moreover, we show that the obtained asymptotic convergence rates of CR are order-wise faster than those of first-order gradient descent algorithms under the KL property.
Accelerated Proximal Gradient Methods for Nonconvex Programming
Nonconvex and nonsmooth problems have recently received considerable attention in signal/image processing, statistics and machine learning. However, solving the nonconvex and nonsmooth optimization problems remains a big challenge. Accelerated proximal gradient (APG) is an excellent method for convex programming. However, it is still unknown whether the usual APG can ensure the convergence to a critical point in nonconvex programming. In this paper, we extend APG for general nonconvex and nonsmooth programs by introducing a monitor that satisfies the sufficient descent property. Accordingly, we propose a monotone APG and a nonmonotone APG. The latter waives the requirement on monotonic reduction of the objective function and needs less computation in each iteration.
Enhancing Convergence of Decentralized Gradient Tracking under the KL Property
Chen, Xiaokai, Cao, Tianyu, Scutari, Gesualdo
We study decentralized multiagent optimization over networks, modeled as undirected graphs. The optimization problem consists of minimizing a nonconvex smooth function plus a convex extended-value function, which enforces constraints or extra structure on the solution (e.g., sparsity, low-rank). We further assume that the objective function satisfies the Kurdyka-{\L}ojasiewicz (KL) property, with given exponent $\theta\in [0,1)$. The KL property is satisfied by several (nonconvex) functions of practical interest, e.g., arising from machine learning applications; in the centralized setting, it permits to achieve strong convergence guarantees. Here we establish convergence of the same type for the notorious decentralized gradient-tracking-based algorithm SONATA. Specifically, $\textbf{(i)}$ when $\theta\in (0,1/2]$, the sequence generated by SONATA converges to a stationary solution of the problem at R-linear rate;$ \textbf{(ii)} $when $\theta\in (1/2,1)$, sublinear rate is certified; and finally $\textbf{(iii)}$ when $\theta=0$, the iterates will either converge in a finite number of steps or converges at R-linear rate. This matches the convergence behavior of centralized proximal-gradient algorithms except when $\theta=0$. Numerical results validate our theoretical findings.
Reviews: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property
The paper investigates the convergence of different measurement of cubic regularization method for non-convex optimization under KL property. It consists with a list of work on CR methods based on the analysis of Nesterove el.s' work. Since the type of methods can guarantee the convergence to the second-order stationary point, it is quite popular also considering the raising of training neural networks. The paper is well-written, clear-organized and the theorems and proofs are easy to follow. Note that this is a pure theoretical work i.e., without new algorithms and/or numerical experiments.
Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization
Wang, Hao, Wang, Ye, Yang, Xiangyu
This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of the matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enabling the provable identification of the "correct" rank of the stationary point within a finite number of iterations. Secondly, we introduce an adaptive updating strategy for smoothing parameters. This strategy automatically fixes parameters associated with zero singular values as constants upon detecting the "correct" rank while quickly driving the rest of the parameters to zero. This adaptive behavior transforms the algorithm into one that effectively solves smooth problems after a few iterations, setting our work apart from existing iteratively reweighted methods for low-rank optimization. We prove the global convergence of the proposed algorithm, guaranteeing that every limit point of the iterates is a critical point. Furthermore, a local convergence rate analysis is provided under the Kurdyka-{\L}ojasiewicz property. We conduct numerical experiments using both synthetic and real data to showcase our algorithm's efficiency and superiority over existing methods.
A majorized PAM method with subspace correction for low-rank composite factorization model
Tao, Ting, Qian, Yitian, Pan, Shaohua
This paper concerns a class of low-rank composite factorization models arising from matrix completion. For this nonconvex and nonsmooth optimization problem, we propose a proximal alternating minimization algorithm (PAMA) with subspace correction, in which a subspace correction step is imposed on every proximal subproblem so as to guarantee that the corrected proximal subproblem has a closed-form solution. For this subspace correction PAMA, we prove the subsequence convergence of the iterate sequence, and establish the convergence of the whole iterate sequence and the column subspace sequences of factor pairs under the KL property of objective function and a restrictive condition that holds automatically for the column $\ell_{2,0}$-norm function. Numerical comparison with the proximal alternating linearized minimization method on one-bit matrix completion problems indicates that PAMA has an advantage in seeking lower relative error within less time.
Accelerated Proximal Gradient Methods for Nonconvex Programming
Nonconvex and nonsmooth problems have recently received considerable attention in signal/image processing, statistics and machine learning. However, solving the nonconvex and nonsmooth optimization problems remains a big challenge. Accelerated proximal gradient (APG) is an excellent method for convex programming. However, it is still unknown whether the usual APG can ensure the convergence to a critical point in nonconvex programming. In this paper, we extend APG for general nonconvex and nonsmooth programs by introducing a monitor that satisfies the sufficient descent property. Accordingly, we propose a monotone APG and a nonmonotone APG. The latter waives the requirement on monotonic reduction of the objective function and needs less computation in each iteration.
An inexact LPA for DC composite optimization and application to matrix completions with outliers
Tao, Ting, Liu, Ruyu, Pan, Shaohua
This paper concerns a class of DC composite optimization problems which, as an extension of convex composite optimization problems and DC programs with nonsmooth components, often arises in robust factorization models of low-rank matrix recovery. For this class of nonconvex and nonsmooth problems, we propose an inexact linearized proximal algorithm (iLPA) by computing in each step an inexact minimizer of a strongly convex majorization constructed with a partial linearization of their objective functions at the current iterate, and establish the convergence of the generated iterate sequence under the Kurdyka-\L\"ojasiewicz (KL) property of a potential function. In particular, by leveraging the composite structure, we provide a verifiable condition for the potential function to have the KL property of exponent $1/2$ at the limit point, so for the iterate sequence to have a local R-linear convergence rate. Finally, we apply the proposed iLPA to a robust factorization model for matrix completions with outliers and non-uniform sampling, and numerical comparison with the Polyak subgradient method confirms its superiority in terms of computing time and quality of solutions.